PAC Nearest Neighbor Queries: Using the Distance Distribution for Searching in High-Dimensional Metric Spaces

نویسندگان

  • Paolo Ciaccia
  • Marco Patella
چکیده

In this paper we introduce a new paradigm for similarity search, called PAC-NN (probably approximately correct nearest neighbor) queries, aiming to break the “dimensionality curse” which inhibits current approaches to be applied in high-dimensional spaces. PAC-NN queries return, with probability at least 1− δ, a (1+ )-approximate NN – an object whose distance from the query q is less than (1 + ) times the distance between q and its NN. We describe how the distance distribution of the query object can be used to determine a suitable stopping condition with probabilistic guarantees on the quality of the result, and then analyze performance of both sequential and index-based PAC-NN algorithms. This shows that PAC-NN queries can be efficiently processed even on very high-dimensional spaces and that control can be exerted in order to tradeoff between the accuracy of the result and the cost.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Using the Distance Distribution for Approximate Similarity Queries in High-Dimensional Metric Spaces

We investigate the problem of approximate similarity (nearest neighbor) search in high-dimensional metric spaces, and describe how the distance distribution of the query object can be exploited so as to provide probabilistic guarantees on the quality of the result. This leads to a new paradigm for similarity search, called PAC-NN (probably approximately correct nearest neighbor) queries, aiming...

متن کامل

PAC Nearest Neighbor Queries: Approximate and Controlled Search in High-Dimensional and Metric Spaces

In high-dimensional and complex metric spaces, determining the nearest neighbor (NN) of a query object q can be a very expensive task, because of the poor partitioning operated by index structures – the so-called “curse of dimensionality”. This also affects approximately correct (AC) algorithms, which return as result a point whose distance from q is less than (1 + ) times the distance between ...

متن کامل

Using the k-Nearest Neighbor Graph for Proximity Searching in Metric Spaces

Proximity searching consists in retrieving from a database, objects that are close to a query. For this type of searching problem, the most general model is the metric space, where proximity is defined in terms of a distance function. A solution for this problem consists in building an offline index to quickly satisfy online queries. The ultimate goal is to use as few distance computations as p...

متن کامل

The Analysis of a Probabilistic Approach to Nearest Neighbor Searching

Given a set S of n data points in some metric space. Given a query point q in this space, a nearest neighbor query asks for the nearest point of S to q. Throughout we will assume that the space is real d-dimensional space <d, and the metric is Euclidean distance. The goal is to preprocess S into a data structure so that such queries can be answered efficiently. Nearest neighbor searching has ap...

متن کامل

Nearest-Neighbor Searching and Metric Space Dimensions

Given a set S of points in a metric space with distance function D, the nearest-neighbor searching problem is to build a data structure for S so that for an input query point q, the point s ∈ S that minimizes D(s, q) can be found quickly. We survey approaches to this problem, and its relation to concepts of metric space dimension. Several measures of dimension can be estimated using nearest-nei...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 1999